Trees

 

 

So far we have learned about arrays, stacks, queues and linked lists, which are known as linear data structures. These are termed as linear because the elements are arranged in a linear fashion (that is, one-dimensional representation). Another very useful data structure is tree, where elements appear in a non-linear fashion, which require two-dimensional representation. Figure 8.1 is an example of such a representation.

 

 

Fig. 8.1  Tree — a non-linear representation of data.

 

        There are numerous examples where tree structure is the efficient means to maintain and manipulate data. In general, where the hierarchy relationship among data are to be preserved, tree is used. Figure 8.2 shows a family hierarchy of various members as a tree which maintains the relationship among them.

        This hierarchy not only gives the ancestors and successors of a family member, but some other information too. For example, if we assume left side member as female and right side member as male then sister, brother, uncle, aunt, grandfather, grandmother, etc. can also be implied automatically.

        As an another example, let us consider an algebraic expression

                                                                 X = (AB)/((C * D) + E)

where, different operators have their own precedence value. A tree structure to represent the same is shown in Figure 8.3. Here, operations having the highest precedence are in the lowest level; which operands are for which operator everything are stated explicitly. Moreover, associativity can easily be imposed if we place the operators at the left or right side.

        These two examples illustrate how powerful this data structure is to maintain a lot of information automatically. Not only these, there are another advantages of this data structure; insertion, deletion, searching, etc., are more efficient in trees than the linear data structures.

        Before going to study this data structure, let us introduce the basic terminologies of tree which will be referred to in subsequent discussions.

 

Fig. 8.2  A family hierarchy in the form of a tree.

 

 

Fig. 8.3  An algebraic expression in the form of a tree.

 

       

       

Basic terminologies

Node.   This is the main component of any tree structure. The concept of node is same as used in linked list. Node of a tree stores the actual data and links to the other node. Figure 8.4(a) represents the structure of a node.

Parent.   Parent of a node is the immediate predecessor of a node. Here, X is the parent of Y and Z. See Figure 8.4(b).

Child.   If the immediate predecessor of a node is the parent of the node then all immediate successors of a node are known as child. For example, in Figure 8.4(b), Y and Z are the two child of X. Child which is at the left side called the left child and that of at the right side is called the right child.

Link.  This is a pointer to a node in a tree. For example, as shown in Figure 8.4(a), LC and RC are two links of a node. Note that there may be more than two links of a node.

Root.  This is a specially designated node which has no parent. In Figure 8.4(c), A is the root node.

 

Fig. 8.4  A tree and its various components.

Leaf.  The node which is at the end and which does not have any child is called leaf node. In Figure 8.4(c), H, I, K, L, and M are the leaf nodes. Leaf node is also alternatively termed as terminal node.

Level.   Level is the rank of the hierarchy and root node is termed as in level 0. If a node is at level l then its child is at level l + 1 and parent is at level l – 1. This is true for all nodes except the root node. Level of the root node is zero. In Figure 8.4(c), levels of various nodes are depicted.

Height.  Maximum number of nodes that is possible in a path starting from root node to a leaf node is called the height of a tree. For example, in Figure 8.4(c), the longest path is ACFJM and hence the height of this tree is 5. It can be easily obtained that h = lmax + 1, where h is the height and lmax is the maximum level of the tree.

Degree.  Maximum number of children that is possible for a node is known as the degree of a node. For example, the degree of each node of the tree as shown in Figure 8.4(c) is 2.

Sibling.  The nodes which have the same parent are called siblings. For example, in
Figure 8.4(c), J and K are siblings.

        Different text use different term for the above said terms, such as depth for height, branch or edge for link, arity for degree, external node for leaf node and internal node for node other than leaf node.

Definition and Concepts

Let us define a tree. A tree is a finite set of one or more nodes such that:

        (i) there is a specially designated node called the root,

        (ii) remaining nodes are partitioned into n (n > 0) disjoint sets T1, T2, . . ., Tn, where each Ti (i = 1, 2, . . ., n) is a tree; T1, T2, . . ., Tn are called subtrees of the root.

        To illustrate the above definition, let us consider the sample tree as given in Figure 8.6.

 

 

 

Fig. 8.6  A sample tree T.


In the sample tree T, there is a set of 12 nodes. Here A is a special node being the root of the tree. Remaining nodes are partitioned into 3 sets T
1, T2, and T3; they are sub-trees of the root node A. By definition, each sub-tree is again a tree. Observe that a tree is defined recursively. A same tree can be expressed in a string notation as shown below:

                                    

 

Or more precisely,

                                                     T = (A(B(E, F(K, L))), C(G), D(H, I, J))

 

Binary Trees

A binary tree is a special form of a tree. Contrary to a general tree, a binary tree is more important and frequently used in various application of computer science. Likewise a general tree, a binary tree can also be defined as:

        A binary tree T is a finite set of nodes, such that

          (i)   T is empty (called empty binary tree), or

         (ii)   T contains a specially designated node called the root of T, and the remaining nodes of T form two disjoint binary trees T1 and T2 which are called left sub-tree and the right sub-tree respectively.

        Figure 8.7 depicts a sample binary tree.

 

 

Fig. 8.7  A sample binary tree with 11 nodes.

 

        One can easily notice the main difference between the definitions of tree and binary tree. A tree can never be empty but binary tree may be empty. Another difference is that in case of a binary tree a node may have at most two children (that is, a tree having degree = 2), whereas in case of a tree, a node may have any number of children.

        Two special situations of a binary tree are possible: full binary tree and complete binary tree.

Full binary tree

A binary tree is a full binary tree, if it contains maximum possible number of nodes in all level. Figure 7.8(a) shows such a tree with height 4.

 

 

Fig. 8.8  Two special cases of binary trees.

Complete binary tree

A binary tree is said to be a complete binary tree, if all its level, except possibly the last level, have the maximum number of possible nodes, and all the nodes at the last level appear as far left as possible. Figure 7.8(b) depicts a complete binary tree.

        Observe that binary tree as represented in Figure 8.7 is neither a full binary tree nor a complete binary tree.

 

Properties of Binary Tree

Binary tree possesses a number of properties. These properties are very much useful to study further. These properties are listed below as Lemmas 8.1–8.7.

 

Lemma 8.1  

      In any binary tree, maximum number of nodes on level l is 2l, where l ³ 0.

 

Proof:  The proof of the above lemma can be done by induction on l. The root is the only node on level l = 0. Hence, the maximum number of nodes on level l = 0 is 20 = 1 = 2l.

        Suppose, for all i, 0 £ i < l and for some l, the above formula is true, that is, the maximum number of nodes at level i is 2i. Since each node in a binary tree has degree 2, so from each node at level i, there may be at most 2 nodes at level i + 1. Thus, the maximum number of nodes at level i + 1 is 2 ´ 2i = 2i+1 Therefore, if the formula is true for any i, then it is also true for i + 1.

        Hence, the proof (see Figure 8.9).

 

 

Fig. 8.9  Binary tree showing Lemma 7.1.

 

 

Lemma 8.2 

      Maximum number of nodes possible in a binary tree of height h is 2h – 1.

 

Proof:  Maximum number of nodes is possible in a binary tree, if maximum number of nodes are present in each level. Thus, maximum number of nodes in a binary tree of height h is

                                   (where lmax is the maximum level of the tree)

                                            (using the formula of a geometric series)

                                     

From the definition of height, we have h = lmax + 1, hence we can write

                               nmax = 2h – 1                                                                                                                  (7.1)

 

Lemma 8.3 

      Minimum number of nodes possible in a binary tree of height h is h.     

 

Proof:  A binary tree has minimum number of nodes if each level has minimum number of nodes. Minimum nodes possible at every level is only one when every parent has one child. Such kind of trees called skew binary trees are shown in Figure 8.10.

 

 

Fig. 8.10  Skew binary trees containing minimum number of nodes.

 

        Thus, a skew binary tree is the tree having only one path which contains h number of nodes if h is its height. Hence nmin = h.

                                                                                                                                   

 

Lemma 8.4 

  For any non-empty binary tree, if n is the number of nodes and e is the number of edges, then n = e + 1.

 

Proof:  By induction method, we have

    Number of nodes, n                                                                Number of edge

                                                                                                        e                              n = e + 1

                    1                                                                                  0                             1 = 0 + 1

                    2                                                                                  1                             2 = 1 + 1

                    3                                                                                  2                             3 = 2 + 1

                    4                                                                                  3                             4 = 3 + 1

                     .                                                                                   .                                      .

                     .                                                                                   .                                      .

                     .                                                                                   .                                      .

                    n¢                                                                             n¢ – 1                    n¢ = (n¢ – 1) + 1

 

        Continuing this, let the above is true for any number of nodes n¢. Thus, n¢ = e¢ + 1.

        Now, if we add one more node into the binary tree having n¢ nodes, then it will increase one more edge in the binary tree. Thus,                                                                                       

                                                                       n¢ + 1 = (e¢ + 1) + 1

or

                                                                  n¢ + 1 = ((n¢ – 1) + 1) + 1

where e¢ = n¢ – 1, therefore

                                                                   n¢ + 1 = (n¢ + 1 – 1) + 1

This implies that, if the formula is true for any n, then it is also true for n + 1. Hence, n =
e
+ 1 is true.

 

Lemma 8.5 

     For any non-empty binary tree T, if n0 is the number of leaf nodes (degree = 0) and n2 is the number of internal node (degree = 2), then n0 = n2 + 1.

 

Proof  Suppose, n be the total number of nodes in T and ni be the number of nodes having degree i, 0 £ i £ 2. So, we have

                                                                           n = n0 + n1 + n2                                                                                                                                                   

as T is a binary tree and no other kind of nodes are possible.

        If e be the number of edge in T then,

                                                                 e = n0 ´ 0 + n1 ´ 1 + n2 ´ 2

or

                                                                              e = n1 + 2n2                                                                                                                                                         

Again, from Lemma 7.4, we have

                                                                                n = e + 1                                                                                                                                                             

Thus, from (ii) and (iii), we can write

                                                                          n = 1 + n1 + 2n2                                                                                                                                                  

And from (i) and (iv), we have

                                                                  n0 + n1 + n2 = 1 + n1 + 2n2

or

                                                                               n0 = n2 + 1                                                                           

Hence the result.

 

Lemma 8.6 

     Height of a complete binary tree with n number of nodes is élog2(n + 1)ù.

 

Proof  Let h be the height of the complete binary tree. Then we can write the following inequality:

                                                                 n £ 20 + 21 + 22 + . . . + 2h–1

or

                                                                                                      n £ 2h – 1                                                                                            

or

                                                                                2h ³ n + 1

Taking logarithm on both sides, we get

                                                                           h ³ log2(n + 1)

or

                                                                    h = élog2(n + 1)ù                                                                       

Hence, the result.

 

 

Lemma 8.7 

     Total number of binary tree possible with n nodes is

                                                                                                                                                                               

Proof  Proof of this lemma is beyond the scope of this book;  

 

Representation of Binary Tree

A (binary) tree must represent a hierarchical relationship between a parent node and (at most two) child nodes. There are two common methods used for representing this conceptual structure. One is implicitly: where we do not require the overhead of maintaining pointers (links) is called linear (or sequential) representation, using an array. The other is explicitly: known as linked representation, using pointers. Whatsoever be the representation, the main objective is that one should have direct access to the root node of the tree, and for given any node, one should have direct access to the children of it.

Linear Representation of a Binary Tree

This type of representation is static in the sense that a block of memory for an array is to be allocated before going to store the actual tree in it, and once the memory is allocated, the size of the tree will be restricted as the memory permits.

        In this representation, the nodes are stored level by level, starting from the zero level where only root node is present. Root node is stored in the first memory location (as the first element in the array).

        Following rules can be used to decide the location of any node of a tree in the array (assuming that array index starts from 1):

          1.   The root node is at location 1.

          2.   For any node with index i, 1 < i £ n, (for some n):

                (a)   PARENT(i) =                                                                                                            

                        For the node when i = 1, there is no parent.

                (b)   LCHILD(i) = 2 * i                                                                                                              

                        If 2 * i > n, then i has no left child.

                (c)   RCHILD(i) = 2 * i + 1                                                                                                       

                        If 2 * i + 1 > n, then i has no right child.

For an example, let us consider the case of representation of the following expression in the form of a tree:

                                                                       (AB) + C * (D/E)

The binary tree will appear as in Figure 8.11(a). The representation of the same tree using array is shown in Figure 8.11(b).

 

Fig. 8.11  Sequential representation of a binary tree.

 

        Next question arises is how can be the size of an array estimated? This value can be obtained easily if the binary tree is a full binary tree. As we know, from Lemma 2, a binary tree of height h can have at most 2h – 1 nodes. So, the size of the array to fit such a binary tree is 2h – 1.

        To understand this, a full binary tree and the index of its various nodes when stored in an array is illustrated as shown in Figure 8.12.

 

 

Fig. 8.12  A full binary tree of height = 4 and label of the nodes represents array locations of nodes.

 

        One can easily realize the fact that if the tree is a full binary tree then there will be an efficient use of storage in the array representation (no array location will be left empty); on the other hand, for a binary tree other than full binary tree there is a wastage of memory.

        Lemma 8.8 gives us the estimation about the maximum and minimum size of the array to store a binary tree with n nodes.

 

Lemma 8.8 

     Maximum and minimum size that an array may require to store a binary tree with n number of nodes are

                                                                                                    Sizemax = 2n – 1                                                                                                                                                            

                                                                                                                               Sizemin = 2 élog2(n+1)ù 1                                                     

 

Proof : If the height of a binary tree denotes the maximum number of nodes in the longest path of the tree, then for a tree with n nodes, the maximum height possible is hmax = n. Such a binary tree is termed as skew binary tree (see Figure 8.13).

 

Fig. 8.13  A skew binary tree with maximum height.

 

        If we store such a binary tree, then it can be seen that the first location is for the root node, third location (22 – 1) for right child of root node (second node) 7th (23 – 1) location for the right child of the second node (third node), and so on. Thus, for a node at level i—it is actually (i + 1)-th node—its location in the array is at 2i+1 – 1. So, the last node (n-th node) which is at (n – 1)-th level will be at 2n – 1. This is therefore the maximum size of the array. Hence,

 

                                                    Sizemax = 2n – 1                                                                                    

 

        Now, when the tree is a full (or complete) binary tree, then we need minimum sized array to accommodate the entire tree. In the case of a full or complete binary tree, with n nodes, minimum height that is possible, is hmin = élog2(n + 1)ù. In that case, the last element will be stored at -th location. Hence the minimum size required is

                                                                                                                              

Note:  The maximum and minimum size of an array required to store a binary tree can be expressed in general as:

                                                                                                Size = 2h – 1                                                                                          

where h be the height of the binary tree. Here, if h is minimum then minimum sized array and if h is maximum then maximum sized array will be computed.

 

 

Advantages of sequential representation of binary tree

Advantages of the sequential representation of a binary tree are:

       1.   Any node can be accessed from any other node by calculating the index and this is efficient from execution point of view.

       2.   Here, data are stored only without any pointers to their successor or ancestor which are mentioned implicitly.

       3.   Programming languages, where dynamic memory allocation is not possible (such as BASIC, FORTRAN), array representation is the only means to store a tree.

Disadvantages of sequential representation of binary tree    

With these potential advantages, there are disadvantages as well. These are:

       1.   Other than full binary trees, majority of the array entries may be empty.

       2.   It allows only static representation. It is no way possible to enhance the tree structure if the array size is limited.

       3.   Inserting a new node to it or deleting a node from it are inefficient with this representation, because these require considerable data movement up and down the array which demand excessive amount of processing time.                 

Linked Representation of Binary Tree

In spite of simplicity and ease of implementation, linear representation of binary trees has a number of overheads. In linked representation of binary trees, all these overheads are taken care of. Linked representation assumes the structures of a node as shown in Figure 8.15.

 

 

 

Fig. 8.15  Structure of a node in linked representation.

 

        Here LC and RC are two link fields to store the address of left child and right child of a node; DATA is the information content of the node. With this representation, if one knows the address of the root node then from it any other node can be accessed. The two forms, that is, ‘tree’ structure and ‘linked’ structure looks almost similar. This similarity implies that the linked representation of a binary tree very closely resembles the logical structure of the data involved. This is why, all the algorithms for various operations with this representation can be easily defined.

        Using the linked representation, the tree with 9 nodes given in Figure 8.16(a) will appear as shown in Figure 8.16(b)–(c).

 

 

Fig. 8.16  Linked representation of a binary tree.

 

        Another advantage of this representation is that it allows dynamic memory allocation; hence size of the tree can be changed as and when need arises without any limitation except the limitation of the availability of the total memory.

        However, so far memory requirement for a tree is concerned, it uses more memory than that the linear representation. Linked representation requires extra memory to maintain the pointers. Some pointers though with null values, still it needs to store them. Lemma 8.9 estimates the number of such null links in a binary tree.

 

Lemma 8.9 

   In a linked representation of a binary tree, if there are n number of nodes then the number of null links l = n + 1.

 

Proof:  By the method of induction. Let R be the pointer to the root node of a binary tree T. If T is empty, that is n = 0 then R = null, thus the number of null link l = 0 + 1 = n + 1. For a single node, that is n = 1, l = 2 = 1 + 1 = n + 1. Similarly, for a tree T having two nodes, l = 3 = 2 + 1 = n + 1. Here it is true for n = 2. Thus, the formula is true for n = 0, 1 and 2 (see Figure 8.17). Let it be true for any n = m. Therefore, we can write l = m + 1. Now, if we add one more node into T with m nodes, then number of nodes will be n¢ = m + 1. Addition of the new node changes one null link into a non-null value whereas it incorporates two null links of its own; thus the net increase in null link by this new node is 1. Hence we have
l
= (m + 1) + 1 or l = n¢ + 1. This shows that if the formula is true for m then it is also true for m + 1. As m is an arbitrary number, the above relation is proved.

 

 

Fig. 8.17  Induction for l = n + 1 when n = 0, 1 and 2

Physical Implementation of Binary Tree in Memory

In this section, we will see how simple it is to implement a binary tree either using sequential (array) or linked representation. In fact, implementation is just a straightforward translation from definition of a binary tree to algorithmic description.

        The algorithm BuildTree_1(...) is to build a tree using an array of element type DATA and the algorithm BuildTree_2 (...) is to build a tree using linked structure. Both the algorithm will build a tree with user interaction.

        Algorithm BuildTree_1 (...) assumes an array A of suitable size to store all the data elements in a binary tree. Algorithm BuildTree_2 (...) assumes the node structure which is same as stated in Section 7.3.2. Both the algorithms follow recursive definition.

 

Algorithm BuildTree_1                                              

Input:  ITEM is the data content of the node I.                                     //A binary tree currently with node at I

Output:  A binary tree with two sub-trees of node I.

Data structure:  Array representation of tree.

Steps:

1.

If (I ¹ 0) then                                                                                                  // If the tree is not empty

2.

 

A[I] = ITEM                                                  // Store the content of the node I into the array A

3.

 

Node I has left sub-tree (Give option = Y/N)?

4.

 

If (option = Y) then                                                                               // If node I has left sub tree

5.

 

 

BuildTree_1 (2 * I, NEWL)                  // Then it is at 2*I with next item as NEWL

6.

 

Else

7.

 

 

BuildTree_1 (0, NULL)                                                                      // Empty sub-tree

8.

 

EndIf

9.

 

Node I has right sub-tree (Give option = Y/N)?

10.

 

If (option = Y)                                                                               // If node I has right sub-tree

11.

 

 

BuildTree_1 (2*I+1, NEWR)  //Then it is at 2*I+1 with next item as NEWR

12.

 

Else

13.

 

 

BuildTree_1 (0, NULL)                                                                      // Empty sub-tree

14.

 

EndIf

15.

EndIf

16.

Stop

 

 

Algorithm BuildTree_2

Input:  ITEM is the content of the node with pointer PTR.

Output:  A binary tree with two sub-trees of node PTR.

Data structure:  Linked list structure of binary tree.

Steps:

1.

If (PTR ¹ NULL) then                                                                                  // If the tree is not empty

2.

 

PTR→DATA = ITEM                                                           // Store the content of node at PTR

3.

 

Node PTR has left sub-tree (Give option = Y/N)?

4.

 

If (option = Y) then

5.

 

 

lcptr = GetNode(NODE)                                       // Allocate memory for the left child

6.

 

 

PTR→LC = lcptr                                                                                // Assign it to Left link

7.

 

 

BuildTree_2 (lcptr, NEWL)                  // Build left sub-tree with next item as NEWL

8.

 

Else

9.

 

 

lcptr = NULL

10.

 

 

PTR→LC = NULL                                                  // Assign for an empty left sub-tree

11.

 

 

BuildTree_2(lcptr, NULL)                                                                  // Empty sub-tree

12.

 

EndIf

13.

 

Node PTR has right sub-tree (give option = Y/N)?

14.

 

If (option = Y) then

15.

 

 

rcptr = GetNode (NODE)                                    // Allocate memory for the right child

16.

 

 

PTR→RC = rcptr                                                                      // Assign it to Right link

17.

 

 

BuildTree_ 2(rcptr, NEWR)                // Build right sub-tree with next item as NEWR

18.

 

Else

19.

 

 

rcptr = NULL

20.

 

 

PTR→RC = NULL                                                // Assign for an empty right sub-tree

21.

 

 

BuildTree_ 2(rcptr, NULL)

22.

 

EndIf

23.

EndIf

24.

Stop

 

Operations on Binary Tree

Major operations on a binary tree can be listed as:

        1. Insertion.  To include a node into an existing (may be empty) binary tree.

        2. Deletion.  To delete a node from a non-empty binary tree.

        3. Traversal.  To visit all the nodes in a binary tree.

        4. Merge.  To merge two binary trees into a larger one.

Some other special operations are also known for special cases of binary trees. These will be mentioned later on. Now, let us discuss the above mentioned operations for general binary trees.

Insertion

With this operation, a new node can be inserted into any position in a binary tree. Inserting a node as an internal node is generally based on some criteria which is usually in context of a special form of a binary tree. We will discuss here a simple case of insertion as external nodes. Figure 8.18 shows how a node containing data content as G can be inserted as a left child of a node having data content E. Two algorithms for two different storage of representations (viz., sequential storage and linked storage) are stated below.

 

 

Fig. 8.18  Insertion of a node as an external node into a binary tree.

 

        Insertion procedure, in fact, is a two-step process: first to search for the existence of a node in the given binary tree after which an insertion to be made, and second is to establish a link for the new node.

Insertion into a sequential representation of a binary tree (as a leaf node)

The following algorithm uses a recursive procedure Search_SEQ to search for a node containing data as KEY.

 

Algorithm InsertBinaryTree_SEQ

Input:  KEY be the data of a node after which a new node has to be inserted with data as ITEM.

Output:  Newly inserted node with data as ITEM as a left or right child of the node with data KEY.

Data structure:  Array A storing the binary tree.

Steps:

1.

l = Search_SEQ(1, KEY)                                                                // Search for the key node in the tree

2.

If (l = 0) then

3.

 

Print “Search is unsuccessful : No insertion”

4.

 

Exit

5.

EndIf                                                                                                                     // Quit the execution

6.

If (A[2 * l] = NULL) or (A[2 * l + 1] = NULL) then                    // If the key node has empty link(s)

7.

 

Read option to read as left (L) or right (R)                                     // child (give option = L/R)

8.

 

If (option = L) then

9.

 

 

If A[2 * l] = NULL then                                                                  // Left link is empty

10.

 

 

 

A[2 * l] = ITEM                                                               // Store it in the array A

11.

 

 

 Else                                                                             // Cannot be inserted as left child

12.

 

 

 

Print “Desired insertion is not possible”               // as it already has a left child

13.

 

 

 

Exit                                                               // Return to the end of the procedure

14.

 

 

EndIf

15.

 

EndIf

16.

 

If (option = R) then                                                                              // Move to the right side

17.

 

 

If (A[2 * l + 1] = NULL) then                                                       // Right link is empty

18.

 

 

 

A[2 * l + 1] = ITEM                                                         // Store it in the array A

19.

 

 

Else                                                                           // Cannot be inserted as right child

20.

 

 

 

Print ‘Desired operation is not possible’                //as it already has a left child

21.

 

 

 

Exit                                                             // Return to the end of the procedures

22.

 

 

EndIf

23.

 

EndIf                                                                                // Key node is having both the child

24.

Else                                                                                     // Key node does not have any empty link

25.

 

Print “ITEM cannot be inserted as a leaf node”

26.

EndIf

27.

Stop

 

        The procedure Search_SEQ(. . .) can be defined recursively as below:

Algorithm Search_SEQ

Input:   KEY be the item of search, INDEX being the index of the node from where searching will start.

Output:  LOCATION, where the item KEY is located, if any.

Data structure:  Array A storing the binary tree. SIZE denotes the size of A.

Steps:

1.

 i = INDEX                                                                                                     // Start from root node

2.

If (A[i] ¹ KEY) then                                                                  // The present node is not the key node

3.

 

If (2 * i £ SIZE) then                                                         // If left sub-tree is not exhausted

4.

 

 

Search_SEQ(2 * i, KEY)                                                // Search in the left sub-tree

5.

 

Else                                                             // Left sub-tree is exhausted and KEY not found

6.

 

 

If (2 * i + 1 £ SIZE) then                                      // If right sub-tree is not exhausted

7.

 

 

 

Search_SEQ (2 * i + 1, KEY)                            // Search in the right sub-tree

8.

 

 

Else                                       // The KEY is found neither in left nor in right sub-tree

9.

 

 

 

Return(0)                               // Return NULL address for unsuccessful search

10.

 

 

EndIf

11.

 

EndIf

12.

Else

13.

 

Return(i)                                                          // Return the address where KEY is found

14.

EndIf

15.

Stop

 

 

 

 

Insertion into a linked representation of a binary tree (as a leaf node)

Let us assume the node structure used in linked representation.

 

Algorithm InsertBinaryTree_LINK

Input:  KEY, the data content of the key node after which a new node is to be inserted and ITEM is the data

           content of the new node that has to be inserted.

Output:  A node with data component ITEM inserted as an external node after the node having data as

               KEY if such a node exist with empty link(s), that is, either or both children is/are absent.

Data structure:  Linked structure of binary tree. ROOT is the pointer to the root node.

Steps:

1.

ptr = Search_LINK(ROOT, KEY)

2.

If (ptr = NULL) then

3.

 

Print “Search is unsuccessful: No insertion”

4.

 

Exit

5.

EndIf

6.

If (ptrLC = NULL) or (ptrRC = NULL)                          // If either or both link(s) is/are empty

7.

 

Read option to insert as left (L) or right (R) child (give option = L/R)

8.

 

If (option = L) then                                                                               // To insert as left child

9.

 

 

If (ptrLC = NULL) then                                    // If the left link is empty then insert

10.

 

 

 

new = GetNode(NODE)

11.

 

 

 

newDATA = ITEM

12.

 

 

 

newLC = newRC = NULL

13.

 

 

 

ptrLC = new

14.

 

 

Else                                                                        // The key node already has left child

15.

 

 

 

Print “Insertion is not possible as left child”

16.

 

 

 

Exit                                                                                        // Quit the execution

17.

 

 

EndIf

18.

 

Else                                                                                                                    // If option = R)

19.

 

 

If (ptrRC = NULL)                                          // If the right link is empty then insert

20.

 

 

 

new = GetNode (NODE)

21.

 

 

 

newDATA = ITEM

22.

 

 

 

newLC = newRC = NULL

23.

 

 

 

ptrRC = new

24.

 

 

Else                                                                      // The key node already has right child

25.

 

 

 

Print “Insertion is not possible as right child”

26.

 

 

 

Exit                                                                                       // Quit the execution

27.

 

 

EndIf

28.

 

Else

29.

 

 

Print “The key node already has child”   //The key node has no empty child

30.

 

EndIf

31.

EndIf

32.

Stop

 

This algorithm uses a procedure Search_LINK to search for a node containing data as KEY, this procedure can be defined recursively as below:

 

Algorithm Search_LINK

Input:   KEY, the item of search, PTR0 is the pointer to the linked list from where the search will start.

Output:  LOCATION, where the item KEY is located, if such ITEM exists.

Data structure:  Linked structure of binary tree having ROOT as the pointer to the root node.

Steps:

1.

ptr = PTR0                                                                                                    // Start from a given node

2.

If (ptrDATA ¹ KEY)                                                              // If the current node is not the key node

3.

 

If (ptrLC ¹ NULL)                                                                    // If the node has left sub-tree

4.

 

 

Search_LINK(ptrLC)                                                          // Search the left sub-tree

5.

 

Else                                                                                            // Key is not in the left sub-tree

6.

 

 

Return(0)

7.

 

EndIf

8.

 

If (ptrRC ¹ NULL)                                                                 // If the node has right sub-tree

9.

 

 

Search_LINK(ptrRC)                                                      // Search the right sub-tree

10.

 

Else                                                                                         // Key is not in the right sub-tree

11.

 

 

Return(0)                                                                                         // Return null pointer

12.

 

EndIf

13.

Else

14.

 

Return(ptr)                                                                        // Return the pointer to the key node

15.

EndIf

16.

Stop

 

               

Deletion

This operation is used to delete any node from any non-empty binary tree. Likewise the insertion, deletion operation also varies from one kind of binary trees to another. Deletion operations in various cases of binary trees will be discussed in due time. Here, we will consider the case of deletion of leave node in any binary tree. Figure 8.19 shows how a node containing data as G can be deleted from a binary tree.

 

 

 

Fig. 8.19  Deletion of an external node in a binary tree.

        Let us see how the deletion operation can be carried out for two kind of storage representation of binary trees. In order to delete a node in a binary tree, it is required to reach at the parent node of the node to be deleted. The link field of the parent node which stores the location of the node to be deleted is then set by a NULL entry. Let us discuss the algorithm individually.

Deleting a leave node from sequential representation of binary trees

Let SIZE be the total length of the array A, where binary tree is stored. The element that has to be deleted is ITEM.

 

Algorithm DeleteBinTree_SEQ

Input:  Given ITEM as data of the node with which the node can be identified for deletion.

Output:  A binary tree without a node having data as ITEM.

Data structure:  Array A storing the binary tree. SIZE denotes the size of A.

Steps:

1.

flag = FALSE                                                                                                             // Start from root node

2.

l = Search_SEQ(1, KEY)                                                                 // Start searching from starting index

3.

If (A[2 * l] = NULL) and (A[2 * l + 1] = NULL)                                          // Test for the leave node

4.

 

flag = TRUE                                                                                                // If so, then delete it

5.

 

A[l] = NULL

6.

Else

7.

 

Print “The node containing ITEM is not a leave node”

8.

 

 

9.

EndIf

10.

If (flag = FALSE)

11.

 

Print “Node does not exist : No deletion”

12.

EndIf

13.

Stop

 

Deleting a leave node from a linked representation of binary tree

Let ROOT be the pointer to the linked list storing a binary tree. The node that has to be deleted contains ITEM as data.

 

Algorithm DeleteBinTree_LINK

Input:   A binary tree whose address of the root pointer is known from ROOT and ITEM is the data of the

             node identified for deletion.

Output:  A binary tree without having a node with data as ITEM.

Data structure:  Linked structure of binary tree having ROOT as the pointer to the root node.

Steps:

1.

ptr = ROOT                                                                                                // Start search from the root

2.

If (ptr = NULL) then

3.

 

Print “Tree is empty”

4.

 

Exit                                                                                                               // Quit the execution

5.

EndIf

6.

parent = SearchParent(ROOT, ITEM)                                    // To find the parent of the desired node

7.

If(parent ¹ NULL) then                                                                             // If the node with ITEM exists

8.

 

ptr1 = parentLC, ptr2 = parent→RC                                     // Get the sibling of parent node

9.

 

If(ptr1→DATA = ITEM) then           // Choose the left sibling with data as ITEM for deletion       

10.

 

 

If(ptr1LC = NULL) and (ptr1RC = NULL) then

11.

 

 

 

parentLC = NULL                                                           // Left child is deleted

12.

 

 

Else

13.

 

 

 

Print “Node is not a leave node: No deletion”

14.

 

 

EndIf

15.

 

Else                                                    // Choose the right sibling with data as ITEM for deletion

16.

 

 

If(ptr2LC = NULL) and (ptr2RC = NULL) then

17.

 

 

 

parentRC = NULL                                                         // Right child is deleted

18.

 

 

Else

19.

 

 

 

Print “Node is not a leave node: No deletion”

20.

 

 

EndIf

21.

 

EndIf

22.

Else

23.

 

Print “Node with data ITEM does not exist: Deletion fails”

24.

EndIf

25.

Stop

 

        This algorithm uses a procedure SearchParent to search the parent of a node containing KEY as data. This procedure can be defined recursively as below:

 

Algorithm SearchParent

Input:   ITEM is the item of search, PTR is the pointer to the node from where the search will start.

Output:  ‘parent’ is the address of the parent node of the node containing the data as ITEM.

Data structure:  Linked structure of binary tree.

Steps:

1.

parent = PTR

2.

If (ptrDATA ¹ ITEM) then

3.

 

ptr1 = ptrLC, ptr2 = ptrRC

4.

 

If (ptr1 ¹ NULL) then                                                                     

5.

 

 

SearchParent(ptr1)                                                             // Search in the left sub-tree

6.

 

Else                                                                                            // Key is not in the left sub-tree

7.

 

 

parent = NULL                      // ITEM is not in left sub-tree, NULL pointer is returned

8.

 

EndIf

9.

 

If (ptr2 ¹ NULL) then                                              

10.

 

 

SearchParent (ptr2)                                                          // Search in the right sub-tree

11.

 

Else                                                                                         // Key is not in the right sub-tree

12.

 

 

parent = NULL                      // ITEM is not in right sub-tree, NULL pointer is returned

13.

 

EndIf

14.

Else

15.

 

Return(parent)                                                   // Return the pointer to the parent node, if any

16.

EndIf

17.

Stop

 

Traversals

Traversal operation is a frequently used operation on a binary tree. This operation is used to visit each node in the tree exactly once. A full traversal on a binary tree gives a linear ordering of the data in the tree. For an example, if the binary tree contains an arithmetic expression then its traversal may give us the expression in infix notation, postfix notation or prefix notation.

        Now a tree can be traversed in various ways. For a systematic traversal, it is better to visit each node (starting from the root) and its subtrees in the same fashion. There are six such possible ways:

 

        1.       R           Tl                    Tr                                                            4.           Tr                Tl                R

        2.       Tl                R             Tr                                                            5.           Tr                R           Tl

        3.       Tl                Tr                    R                                        6.           R           Tr                Tl

 

Here T1 and Tr denote the left and right sub-trees of the node R, respectively. Consider a binary tree as shown in Figure 8.21.

 

 

 

Fig. 8.21  A binary tree representing an arithmetic expression.

        Visit 1:

                                                                + Tl+Tr+

                                                                + - Tl–Tr–Tr+

                                                                + - A TlATrATr–Tr+

                                                                + - A B TlBTrBTr+

                                                                + - A B * Tl*Tr*

                                                                + - A B * C TlCTrCTr*

                                                                + - A B * C / Tl/Tr/

                                                                + - A B * C / E TlETrETr/

                                                                + - A B * C / E F TlFTrF

                                                                + - A B * C / E F

        Likewise, one can obtain the other visits as shown below (only the result):

        Visit 2:                                                                                                                                                          

                                                                         AB + C * E / F

        Visit 3:                                                                                                                                                          

                                                                         A BC E F / * +

        Visit 4:

                                                                         F E / C * B A – +

        Visit 5:

                                                                         F / E * C + BA

        Visit 6:

                                                                         + * / F E CB A

        It can be noted that Visit 1 and Visit 4 are mirror symmetric; similarly, Visit 2 with
Visit 5 and Visit 3 with Visit 6. So, out of six possible traversals, only three are fundamental, they are categorized as given below:

          1.   R Tl Tr                          (Preorder)

          2.   Tl R Tr                          (Inorder)

          3.   Tl Tr R                 (Postorder)

Preorder traversal

In this traversal, root is visited first then left sub-tree in preorder fashion and then right sub-tree in preorder fashion. It can be defined as below:

   Ÿ   Visit the root node R.

   Ÿ   Traverse the left sub-tree of R in preorder.

Ÿ Traverse the right sub-tree of R in preorder.

Inorder traversal

With this traversal, before visiting the root node, left sub-tree of the root node is to be visited then root node and after the visit of the root node right sub-tree of the root node is visited. Visiting both the sub-trees are in the same fashion as the tree itself. Such traversal can be stated as below:

   Ÿ   Traverse the left sub-tree of the root node R is inorder.

   Ÿ   Visit the root node R.

   Ÿ   Traverse the right sub-tree of the root node R is inorder.

Postorder traversal

Here, root node is visited at the end, that is, first visit the left sub-tree, then right sub-tree, and at last is the root. Definition for this type of tree traversal is stated below:

   Ÿ   Traverse the left sub-tree of the root R in postorder

   Ÿ   Traverse the right sub-tree of the root R in postorder

   Ÿ   Visit the root node R.

        Observe that each algorithm contains three steps out of which two steps are defined recursively. In preorder, root is to be visited first (pre), in inorder, root is to be visited in between left sub-tree and right sub-tree (in) and in postorder, root is to be visited after left sub-tree and right sub-tree are visited (post). Also, it can be noted that, if the binary tree contains an arithmetic expression, then its inorder, preorder and postorder traversals produces  infix, prefix, and postfix expressions, respectively.

        Next, let us consider the implementation of the above traversals. We will define three algorithms assuming that binary tree is represented using linked structure. For sequential representation of binary tree these can be defined analogously.

        Let us assume a process Visit(N) to imply some operation (say, display on the screen, increasing the number of node count etc.) while visiting the node N .

 

Inorder traversal of a binary tree

Recall that inorder traversal of a binary tree follows three ordered steps as:

   Ÿ   Traverse the left sub-tree of the root node R in inorder

   Ÿ   Visit the root node R

   Ÿ   Traverse the right sub-tree of the root node R in inorder.

Out of these steps, steps 1 and 3 are defined recursively. Following is the algorithm Inorder to implement the above definition.

 

Algorithm Inorder

Input:  ROOT is the pointer to the root node of the binary tree.

Output:  Visiting of all the nodes in inorder fashion.

Data structure:  Linked structure of binary tree.

Steps:

1.

ptr = ROOT                                                                                                              // Start from ROOT

2.

If (ptr ¹ NULL) then                                                                                   // If it is not an empty node

3.

 

Inorder(ptr→LC)                                         // Traverse the left sub-tree of the node in inorder

4.

 

Visit(ptr)                                                                                                             // Visit the node

5.

 

Inorder (ptr→RC)                                      // Traverse the right sub-tree of the node in inorder

6.

EndIf

7.

Stop

 

Preorder traversal of a binary tree

The definition of preorder traversal of a binary tree, as already discussed which is further presented as below.

   Ÿ  Visit the root node R.

   Ÿ  Traverse the left sub-tree of the root node R in preorder.

   Ÿ  Traverse the right sub-tree of the root node R in preorder.

The algorithm Preorder is presented to precise the above definition as below:

 

Algorithm Preorder

Input:  ROOT is the pointer to the root node of the binary tree.

Output:  Visiting of all the nodes in preorder fashion.

Data structure:  Linked structure of binary tree.

Steps:

1.

ptr = ROOT                                                                                                        // Start from the ROOT

2.

If (ptr ¹ NULL) then                                                                                   // If it is not an empty node

3.

 

Visit(ptr)                                                                                                              // Visit the node

4.

 

Preorder(ptr→LC)                                     // Traverse the left sub-tree of the node in preorder

5.

 

Preorder(ptr→RC)                                   // Traverse the right sub-tree of the node in preorder

6.

EndIf

7.

Stop

 

Postorder traversal of a binary tree

The definition of postorder traversal of a binary tree is as given below:

   Ÿ   Traverse the left sub-tree of the root node R in postorder.

   Ÿ   Traverse the right sub-tree of the root node R in postorder.

   Ÿ   Visit the root node R

The algorithm to implement the above is stated as below:

 

Algorithm Postorder

Input:  ROOT is the pointer to the root node of the binary tree.

Output:  Visiting of all the nodes in preorder fashion.

Data structure:  Linked structure of binary tree.

Steps:

1.

ptr = ROOT                                                                                                              // Start from the root

2.

If (ptr ¹ NULL) then                                                                                    // If it is not an empty node

3.

 

Postorder(ptr→LC)                                        // Traverse the left sub-tree of the node in inorder

4.

 

Postorder(ptr→RC)                                    // Traverse the right sub-tree of the node in inorder

5.

 

Visit(ptr)                                                                                                               // Visit the node

6.

EndIf

7.

Stop

 

Formation of binary tree from its traversals

Sometime it is required to construct a binary tree if its traversals are known. From a single traversal, it is not possible to construct unique binary tree, however, if two traversals are known then corresponding tree can be drawn uniquely. We will examine these possibilities and then chalk out the algorithms for the same. Basic principle for the formation is stated as below:

   Ÿ   If the preorder traversal is given, then the first node is the root node. If the postorder traversal is given then the last node is the root node.

   Ÿ   Once the root node is identified, all the node in the left sub-trees and right sub-trees of the root node can be identified.

   Ÿ   Same technique can be applied repeatedly to form sub-trees.

It can be noted that, for the purpose mentioned, two traversals are essential out of which one should be inorder traversal and another preorder or postorder; alternatively, given preoder and postorder traversals, binary tree cannot be obtained uniquely.

        As illustrations, let us consider following two examples.

Example 8.1:   Suppose, inorder and preorder traversals of a binary tree are as below:

        Inorder                        D        B        H       E        A        I         F        J        C        G

        Preorder                      A        B        D       E        H        C        F        I         J         G

We have to construct the binary tree. The following steps are to be followed.

          1.   From the preorder traversal, it is evident that A is the root node.

          2.   In inorder traversal, all the nodes which are at the left side of A belong to the left sub-tree and those are at right side of A belong to the right sub-tree.

          3.   Now the problem reduces to form sub-trees and the same procedure can be applied repeatedly. Figure 8.23 illustrates these.

 

 

 

Fig. 8.23  Formation of binary tree from its inorder and preorder traversals.

 

Example 8.2:     Following is the inorder and postorder traversals of a binary tree:

        Inorder                n1       n2        n3       n4        n5       n6       n7        n8        n9

        Postorder            n1       n3        n5       n4        n2       n8       n7        n9        n6

We are to draw the corresponding binary tree. Construction of binary tree is shown in Figure 8.24.

 

 

Fig. 8.24  Construction of binary tree from its inorder and postorder traversals.

  

        Now, let us define the procedure to construct a binary tree when its two essential traversals are known. Following is the algorithm InPreTree to form a binary tree from its inorder and preorder traversals. Let us assume that two traversals are stored in two arrays IN and PRE where the number of nodes is N. The algorithm given below is a recursive definition. The initial values for the recursion call are I1 = 1, I2 = N, P1 = 1, P2 = N are pointers to the root node. Figure 8.25 will help to understand the above recursive definition:

 

 

Fig. 8.25  Recursive approach of constructing a binary tree from inorder and preorder traversals.

Merging of Two Binary Trees

Another fundamental operation that is possible is the merge operation. This operation is applicable for trees which are represented using linked structure. There are two ways that this operation can be carried out.

        Suppose, T1 and T2 are two binary trees. T2 can be merged with T1 if all the nodes from T2, one by one, is inserted into the binary tree T1 (insertion may be as internal node when it has to maintain certain property or may be as external nodes).

        Another way, when the entire tree T2 (or T1) is included as a sub-tree of T1 (or T2). For this, obviously we need that in either (or both) tree there must be at least one null sub-tree. We will consider in our subsequent discussion, this second case of merging.

        Before, going to perform the merging, we have to test for compatibility. If in both the trees, root node have both the left and right sub-trees then merge will fail; otherwise if T1 has left sub-tree (or right sub-tree) empty then T2 will be added as the left sub-tree (or right sub-tree) of the T1 and vice versa. For example, as in Figure 8.27, T1 has right sub-tree empty hence T2 is added as the right sub-tree of T1. Note that if T(N) denotes the number of nodes n in tree T then

                                                               T(n1 + n2) = T1(n1) + T2(n2)

where T is the resultant tree of merging T1 and T2.

 

Fig. 8.27  Merging of two binary trees.

 

        Following is the algorithm MERGE to define the merge operation.

Algorithm MergeTrees

Input:  Two pointers ROOT1 and ROOT2 are the roots of the two binary tree T1 and T2, respectively with

            linked structure.

Output:  A binary tree containing all the nodes of T1 and T2 having pointer to the root as ROOT.

Data structure:  Linked structure of binary trees are considered.

Steps:

1.

If (ROOT1 = NULL) then                                                           // If T1 is empty then T2 is the result

2.

 

ROOT = ROOT2                                                                                       // Tree T2 is the result

3.

 

Exit

4.

Else

5.

 

If (ROOT2 = NULL) then                                                    // If T2 is empty then T1 is a result

6.

 

 

ROOT = ROOT1

7.

 

 

Exit

8.

 

Else                                                                                                 // If both trees are non empty

9.

 

 

If (ROOT1LCHILD = NULL) then

10.

 

 

 

ROOT1LCHILD = ROOT2                        // Merge T2 as the left child of T1

11.

 

 

 

ROOT = ROOT1

12.

 

 

Else

13.

 

 

 

If (ROOT1RCHILD = NULL) then

14.

 

 

 

 

ROOT1→RCHILD = ROOT2              // Merge T2 as the right child of T1

15.

 

 

 

 

ROOT = ROOT1

16.

 

 

 

Else

17.

 

 

 

 

If (ROOT2LCHILD = NULL) then

18.

 

 

 

 

 

ROOT2LCHILD = ROOT1   // Merge T1 as the left child of T2

19.

 

 

 

 

 

ROOT = ROOT2

20.

 

 

 

 

Else

21.

 

 

 

 

 

If (ROOT2RCHILD = NULL) then

22.

 

 

 

 

 

 

ROOT2RCHILD = ROOT1  // Merge T1 as the right child of T2

23.

 

 

 

 

 

 

ROOT = ROOT2

24.

 

 

 

 

 

Else

25.

 

 

 

 

 

 

Print “Trees are not compatible for merge operation”

26.

 

 

 

 

 

 

Exit

27.

 

 

 

 

 

EndIf

28.

 

 

 

 

EndIf

29.

 

 

 

EndIf

30.

 

 

EndIf

31.

 

EndIf

32.

EndIf

33.

Stop

 

tYPES OF Binary Trees

There are several types of binary trees possible each with its own properties. Few important and frequently used trees are listed as below.

       1.   Binary search tree

       2.   Heap tree

       3.   Height balanced tree (also known as AVL tree

       4.   Huffman tree

Binary Search Tree

A binary tree T is termed as binary search tree (or binary sorted tree) if each node N of T satisfies the following property:

        The value at N is greater than every value in the left sub-tree of N and is less than every value in the right sub-tree of N.

        Figure 8.28 shows two binary search trees for two different types of data.

Observe that in Figure 8.28(b), the lexicographical ordering is taken among the data whereas in Figure 8.31(a), numerical ordering is taken.

        Now, let us discuss the possible operations on any binary search tree. Operations those are generally encountered to manipulate this data structures are:

  Ÿ   Searching for a data

  Ÿ   Inserting any data into it

  Ÿ   Deleting any data from it

Ÿ Traversing the tree.

 

(a) A Binary search tree with numeric data

 

 

(b) A Binary search tree with alphabetic data

 

Fig. 8.28  Binary search trees.

 

 

 

Searching in a binary search tree

Searching for a data in a binary search tree is much faster than in arrays or linked lists. This is why, in the applications where frequent searching operations are to be performed, this data structure is used to store data. In this section, we will discuss how this operation can be defined.

        Suppose, in a binary search tree T, ITEM be the item of search. We will assume that, the tree is represented using linked structure.

        We start from the root node R. Then, if ITEM is less than the value in the root node R, we proceed to its left child; if ITEM is greater than the value in the node R, we proceed to its right child. The process will be continued till the ITEM is not found or we reach to a dead end, that is, leave node. Figure 8.29 shows the track (in shaded line) for searching of 54 in a binary search tree.

 

 

Fig. 8.29  Searching of 54 in a binary search tree.

 

        The algorithm for searching an element in a binary search tree is stated as below:

 

Algorithm Search_BST

Input:  ITEM is the data that has to be searched.

Output:  If found then pointer to the node containing as data ITEM else a message.

Data structure:  Linked structure of the binary tree. Pointer to the root node is ROOT.

Steps:

1.

ptr = ROOT, flag = FALSE                                                                                              // Start from root

2.

While (ptr ¹ NULL) and (flag = FALSE) do

3.

 

Case: ITEM < ptrDATA                                                                         // Go to the left sub-tree

4.

 

 

ptr = ptrLCHILD

5.

 

Case: ptrDATA = ITEM                                                                           // Search is successful

6.

 

 

flag = TRUE

7.

 

Case: ITEM > ptr DATA                                                                     // Go to the right sub-tree

8.

 

 

ptr = ptrRCHILD

9.

 

EndCase

10.

EndWhile

11.

If (flag = TRUE) then                                                                                          // Search is successful

12.

 

Print “ITEM has found at the node”, ptr

13.

Else

14.

 

Print “ITEM does not exist: Search is unsuccessful”

15.

EndIf

16.

Stop

 

 

Inserting into a binary search tree

Insertion operation on a binary search tree is conceptually very simple. It is in fact, one step more than the searching operation. To insert a node with data, say ITEM, into a tree, the tree is to be searched starting from the root node. If ITEM is found, do nothing, otherwise ITEM is to be inserted at the dead end where search halts. Figure 7.33 shows the insertion of 5 into a binary tree. Here, search proceeds starting from root node as 6–2–4 then halts when it finds that right child is null (dead end). This simply means that if 5 occur, then it should occurred in the right part of the node 4. So, 5 should be inserted as the right child of 4.

 

                 (a) Before insertion                                                (b) Search find the location where 5 should

                                                                                                         be inserted

 

Fig. 8.30 Inserting 5 into a binary search tree.

 

        Algorithm for performing the insertion operation is stated as below:

 

Algorithm Insert_BST

Input:  ITEM is the data component of a node that has to be inserted.

Output:  If there is no node having data ITEM, it is inserted into the tree else a message.

Data structure:  Linked structure of binary tree. Pointer to the root node is ROOT.

Steps:

1.

ptr = ROOT, flag = FALSE                                                                                // Start from root node

2.

While (ptr ¹ NULL) and (flag = FALSE) do

3.

 

Case: ITEM < ptrDATA                                                                   // Go to the left sub-tree

4.

 

 

ptr1 = ptr

5.

 

 

ptr = ptrLCHILD

6.

 

Case: ITEM > ptrDATA                                                                 // Go to the right sub-tree

7.

 

 

ptr1 = ptr

8.

 

 

ptr = ptrRCHILD

9.

 

Case: ptrDATA = ITEM                                                                                    // Node exists

10.

 

 

flag = TRUE

11.

 

 

Print “ITEM already exists”

12.

 

 

Exit                                                                                                   // Quit the execution

13.

 

EndCase

14.

EndWhile

16.

If (ptr = NULL) then                                                                // Insert when search halts at dead end

17.

 

new = GetNode(NODE)

18.

 

newDATA = ITEM                                                         // Avail a node and then initialize it

19.

 

newLCHILD = NULL

20.

 

newRCHILD = NULL

21.

 

If (ptr1DATA < ITEM) then                                                          // Insert as the right child

22.

 

 

ptr1RCHILD = new

23.

 

Else

24.

 

 

ptr1LCHILD = new                                                                // Insert as the left child

25.

 

EndIf

26.

EndIf

27.

Stop

            

Deleting a node from a binary search tree

Another frequently used operations on a binary search tree is to delete any node from it. This operation, however, is slightly complicated than the previous two operations discussed.

        Suppose T is a binary search tree, and ITEM is the information given which has to be deleted from T if it exists in the tree. Suppose N be the node which contains the information ITEM. Let us assume PARENT(N) denotes the parent node of N and SUCC(N) denotes the inorder successor of the node N (inorder successor means the node which comes after N during the inorder traversal of T). Then the deletion of the node N depends on the number of its children. Hence, three cases may arise:

        Case 1:  N is the leave node

        Case 2:  N has exactly one child

        Case 3:  N has two childs

Deletion of N then can be carried out on the various situations as illustrated in Figure 8.31 and also stated below:

 

 

Fig. 8.31  Three cases for deleting a node N.

 

 

Case 1:  N is deleted from T by simply setting the pointer of N in the parent node PARENT(N) by NULL value. See Figure 8.32(a).

Case 2:  N is deleted from T by simply replacing the pointer of N in PARENT(N) by the pointer of the only child of N. See the Figure 8.32(b).

Case 3:  N is deleted from T by first deleting SUCC(N) from T (by using Case 1 or Case 2; (it can be verified that SUCC(N) never has a left child) and then replace the data content in node N by the data content in node SUCC(N). See Figure 8.32(c).

 

1.     Deletion of the node 27

 

 

2.     Deletion of the node 45

 

 

(c) Deletion of the node 20

Fig. 8.32  Various cases of deletion in a binary search tree.

 

Now let us describe the algorithm to delete a node from a binary search tree.

 

Algorithm Delete_BST

Input:  ITEM is the data of the node to be deleted.

Output:  If the node with data as ITEM exist it is deleted else a message.

Data structure:  Linked structure of binary tree. Pointer to the root node is ROOT.

Steps:

1.

ptr = ROOT, flag = FALSE

2.

While (ptr ¹ NULL) and (flag = FALSE) do                           // Step to find the location of the node

3.

 

Case: ITEM < ptrDATA

4.

 

 

parent = ptr

5.

 

 

ptr = ptrLCHILD

6.

 

Case: ITEM > ptrDATA

7.

 

 

parent = ptr

8.

 

 

ptr = ptrRCHILD

9.

 

Case: ptrDATA = ITEM

10.

 

 

flag = TRUE

11.

 

EndCase

12.

EndWhile

13.

If (flag = FALSE) then                                                                       // When the node does not exist

14.

 

Print “ITEM does not exist: No deletion”

15.

 

Exit                                                                                                               // Quit the execution

16.

EndIf

 

/* DECIDE THE CASE OF DELETION */

17.

If (ptrLCHILD = NULL) and (ptrRCHILD = NULL) then                         // Node has no child

18.

 

case = 1

19.

Else

20.

 

If (ptrLCHILD ¹ NULL) and (ptrRCHILD ¹ NULL) then

                                                                                                   // Node contains both the child

21.

 

 

case = 3

22.

 

Else                                                                                           // Node contains only one child

23.

 

 

case = 2

24.

 

EndIf

25.

EndIf

 

/* DELETION: CASE 1 */

26.

If (case = 1) then

27.

 

If (parentLCHILD = ptr) then                                                      // If the node is a left child

28

 

 

parentLCHILD = NULL                                                       // Set pointer of its parent

29.

 

Else

30.

 

 

parentRCHILD = NULL

31.

 

EndIf

32.

 

ReturnNode(ptr)                                                     // Return deleted node to the memory bank

33.

EndIf

 

/* DELETION: CASE 2 */

34.

If (case = 2) then                                                                 // When the node contains only one child

35.

 

If (parentLCHILD = ptr) then                                                      // If the node is a left child

36.

 

 

If (ptrLCHILD = NULL) then

37.

 

 

 

parentLCHILD = ptrRCHILD

38.

 

 

Else

39.

 

 

 

parentLCHILD = ptrLCHILD

40.

 

 

EndIf

41.

 

Else

42.

 

 

If (parentRCHILD = ptr) then

43.

 

 

 

If (ptrLCHILD = NULL) then

44.

 

 

 

 

parentRCHILD = ptrRCHILD

45.

 

 

 

Else

46.

 

 

 

 

parentRCHILD = ptrLCHILD

47.

 

 

 

EndIf

48.

 

 

EndIf

49.

 

EndIf

50.

 

ReturmnNode(ptr)                                                   // Return deleted node to the memory bank

51.

EndIf

 

/* DELETION: CASE 3 */

52.

If (case = 3)                                                                                               // When contains both child

53.

 

ptr1 = SUCC(ptr)                                                       // Find the in order successor of the node

54.

 

item1 = ptr1DATA

55.

 

Delete_BST (item1)                                                                    // Delete the inorder successor

56.

 

ptrDATA = item1                             // Replace the data with the data of in order successor

57.

EndIf

58.

Stop

 

Note that, we assume the function SUCC(ptr) which returns pointer to the inorder successor of the node ptr. It can be verified that inorder successor of ptr always occurs in the right sub-tree of ptr, and inorder successor of ptr does not have left child. Figure 8.33 shows an instance.

 

                                                           

 

Fig. 8.33  Inorder successor of a node ptr.

 

        Finding the inorder successor of any node is very easy and is stated as below:

 

Algorithm SUCC

Input:  Pointer to a node PTR whose inorder successor is to be known.

Output:  Pointer to the inorder successor of ptr.

Data structure:  Linked structure of binary tree.

Steps:

1.

ptr1 = PTRRCHILD                                                                                     // Move to the right sub-tree

2.

If (ptr1 ¹ NULL) then                                                                                  // If right sub-tree is not empty

3.

 

While (ptr1LCHILD ¹ NULL) do                                              // Move to the left-most end

4.

 

 

ptr1 = ptr1LCHILD

5.

 

EndWhile

6.

EndIf

7.

Return(ptr1)                                                                     // Return the pointer of the inorder successor

8.

Stop

Traversals on binary search tree

All the traversal operations are applicable in binary search trees without any alteration. It can be verified that inorder traversal on a binary search tree will give the sorted order of data in ascending order. If we require to sort a set of data, a binary search tree can be built with those data and then inorder traversal can be applied. This method of sorting is known as binary sort and this is why binary search tree is also termed as binary sorted tree. This sorting method is considered as one of the efficient sorting method.

Application of binary search trees

Binary search tree is one of the most important data structure and it has a large number of applications. It can be used in sorting method and in searching method. In fact, binary search tree is used to define other data structure, for example, B tree. Here, we shall learn an example of it in the context of searching when the key values are not necessarily equally likely. In other words, there are several applications where key values are accessed according to some frequency values. One such application, suppose, writing a translator from English to Hindi. Here, for each occurrence of each English word in the input text we are to look up its Hindi equivalent. One simple solution to this problem is to construct a binary search tree for all words involved in the English vocabulary. With this solution it may be the case that a frequently used word such as “the” appears far from the root while a rarely used word such as “neanderthal” appears near the root. This is obviously not a good a way of organizing the binary search tree because it effectively slows down the translation process. This consideration makes it natural to ask about the best possible tree for searching a table of keys with given frequencies.

      

Heap Trees

Suppose, H is a complete binary tree. Then it will be termed as heap tree, if it satisfies the following properties:

          (i)   For each node N in H, the value at N is greater than or equal to the value of each of the children of N.

         (ii)   Or in other words, N has the value which is greater than or equal to the value of every successor of N.

Such a heap tree is called max heap. Similarly, min heap is possible, where any node N has the value less than or equal to the value of any of the successors of N. Figure 8.34 shows max heap and min heap with numeric data:

 

 

Fig. 8.34  Heap trees.

 

Note that in a max heap, the root node contains the largest data whereas in a min heap it contains the smallest data.

 

Representation of heap tree

A heap tree can be represented using linked structure. But single array representation has certain advantages for a heap tree over its linked representation. As heap tree is a complete binary tree, so there will be no wastage of array space between two non-null entries; if any null entries, these are only at the tail of the array. Another facility is that we do not have to maintain any links of descendants (child), here, these are automatically implied, to access a child only a few simple arithmetic computations are involved. The major advantage with this representation is that, from a node we can go in both directions, towards its ancestors and towards its successor. This is, although possible in linked structure but is a matter of maintenance of extra link field.

        Hence, for a heap tree, it is better to consider the single array representation. Figure 8.35 shows a heap tree (as represented in Figure 8.34(b)) using a single array.

 

 

Fig. 8.35  Single array representation of a heap (min) tree.

 

During the subsequent discussions on heap tree we will assume the single array representation unless we specify the other.

 

 

Operations on heap tree

The major operations required to be performed on a heap tree are: insertion, deletion and merging. Let us discuss all these operations one by one.

Insertion into a heap tree

This operation is used to insert a node into an existing heap tree satisfying the properties of heap tree. Repeated insertions of data, starting from an empty heap tree, one can build up a heap tree. Let us first discuss the method of insertion, then formal description of the procedure will be described.

        Let us consider the heap (max) tree as shown in Figure 8.36. We want to insert 19 and 111 in it. The principle of insertion is that, first we have to adjoin the data in the complete binary tree. Next, we have to compare it with the data in its parent; if the value is greater than that at parent then interchange the values. This will continue between two nodes on path from the newly inserted node to the root node till we get a parent whose value is greater than its child or we reached at the root.

        For illustration, 19 is added as the left child of 25, Figure 7.36(b). Its value is compared with its parent’s value, and to be a max heap, parent’s value > child’s value is satisfied, hence interchange as well as further comparison is no more required.

        As an another illustration, let us consider the case of inserting 111 into the resultant heap tree. First, 111 will be added as right child of 25, when 111 is compared with 25 it requires interchange. Next, 111 is compared with 85, another interchange takes place. At the last, 111 is compared with 95 and then another interchange occurs. Now, our process stops here, as 111 is now in root node. The path on which these comparisons and interchanges have taken places are shown by dashed line, Figure 8.36(c).

 

 

 

 

Fig. 8.36  Insertion of 19 and 111 into a heap tree.

 

 

        Now, we are in a position to describe the algorithm InsertMaxHeap to insert a data into a max heap tree. Let us assume that A[1...SIZE] be an array of data where the heap tree is stored and its maximum capacity is SIZE. N denotes the number of nodes currently in the heap tree. For an empty heap tree, N = 0.

 

Algorithm InsertMaxHeap

Input:  ITEM, the data to be inserted; N, the strength of nodes.

Output:  ITEM is inserted into the heap tree.

Data structure:  Array A[1...SIZE] stores the heap tree; N being the number of node in the tree.

Steps:

1.

If (N ³ SIZE) then

2.

 

Print “Heap tree is saturated: Insertion is void”

3.

 

Exit

4.

Else

5.

 

N = N + 1

6.

 

A[N] = ITEM                                                         // Adjoin the data in the complete binary tree

7.

 

i = N                                                                                              // Last node is the current node

8.

 

p = i div 2.                                                                                               // Its parent node is at p

9.

 

While (p > 0) and (A[p] < A[i]) do                // Continue till the root is reached or out of order

10.

 

 

temp = A[i]

11.

 

 

A[i] = A[p]

12.

 

 

A[p] = temp

13.

 

 

i = p                                                                                               // Parent becomes child

14.

 

 

p = p div 2                                                            // Parent of parent becomes new parent

15.

 

EndWhile

16.

EndIf

17.

Stop

 

Deletion of a node from heap tree

Any node can be deleted from a heap tree. But from the application point of view, deleting the root node has some special importance. We will first illustrate the principle of such deletion and then the formal description. The principle can be stated as below:

         ·   Read the root node into a temporary storage say, ITEM.

         ·   Replace the root node by the last node in the heap tree. Then reheap the tree as stated below:

                 Let newly modified root node be the current node. Compare its value with the value of its two child. Let X be the child whose value is the largest. Interchange the value of X with the value of the current node.

                 Make X as the current node.

                 Continue reheap, if the current node is not an empty node.

        This is illustrated as in Figure 8.37. In Fig. 8.37, the root node is 99. The last node is 26, and it is in the level 3. So, 99 is replaced by 26 and this node with data 26 is removed from the tree. Next 26 at root node is compared with its two child 45 and 63. As 63 is greater, so they are interchanged. Now, 26 is compared with its children, namely, 57 and 42, as 57 is greater, so they are interchanged. Now, 26 appears as the leave node, hence reheap is completed.

 

 

 

Fig. 8.37  Deletion of the root node 99 from a max heap tree.

 

The algorithm for the deletion of a node from a heap (max) tree is is given below:

 

Algorithm MaxHeapdelete

Input:  A heap tree with elements.

Output:  ITEM is being the data deleted and the remaining tree after deletion.

Data structures:  An array A[1... SIZE] storing the heap tree; N is the number of nodes.

Steps:

1.

If (N = 0) then

2.

 

Print “Heap tree is exhausted: Deletion is not possible”

3.

 

Exit

4.

EndIf

5.

ITEM = A[1]                                                                           // Value at the root node under deletion

6.

A[1] = A[N]                            // Replace the value at root by its counterpart at last node on last level

7.

N = N – 1                                                                                 // Size of the heap tree is reduced by 1

8.

flag = FALSE, i = 1

9.

While(flag = FALSE) and (i < N) do                                                              // Rebuild the heap tree

10.

 

lchild = 2 * i, rchild = 2 * i+1               // Addresses of left and right child of the current node

11.

 

If (lchild £ N) then

12.

 

 

 x = A[lchild]

13.

 

Else

14.

 

 

x = – ¥

15.

 

EndIf

16.

 

If (rchild £ N) then

17.

 

 

y = A[rchild]

18.

 

Else

19.

 

 

y = – ¥

20.

 

EndIf

21.

 

If (A[i] > x) and (A[i] > y) then                                             // If parent is larger than its child

22.

 

 

flag = TRUE                                                                                         // Reheap is over

23.

 

Else                                                                                       // Any child may have large value

24.

 

 

If (x > y) and (A[i] < x)                                        // If left child is larger than right child

25.

 

 

 

Swap(A[i], A[lchild])         // Interchange the data between parent and left child

26.

 

 

 

 i = lchild                                                   // Left child becomes the current node

27.

 

 

Else

28.

 

 

 

If (y > x) and (A[i] < y)                             // If right child is larger than left child

29.

 

 

 

 

Swap(A[i], A[rchild])     //Interchange the data between parent and right child

30.

 

 

 

 

i = rchild                                       // Right child becomes the current node

31.

 

 

 

EndIf

32.

 

 

EndIf

33.

 

EndIf

34.

EndWhile

35.

Stop

               

Merging two heap trees

Consider, two heap trees H1 and H2. Merging the tree H2 with H1 means to include all the nodes from H2 to H1. H2 may be min heap or max heap and the resultant tree will be min heap if H1 is min heap else it will be max heap.

        Merging operation consists of two steps: Continue steps 1 and 2 while H2 is not empty:

        1. Delete the root node, say x, from H2.

        2. Insert the node x into H1 satisfying the property of H1.

Figure 8.38 illustrates the merge operation between two heap trees.

 

 

Fig. 8.38  Merging two heap trees.

 

 

 

Application of heap trees

There are two main applications of heap trees known: (a) sorting and (b) priority queue implementation. The sorting method which is based on heap tree is called heap sort and known as the efficient sorting method. We have mentioned the problem of implementing priority queues using linked structure; using heap tree there is an elegant solution possible for the same problem. Let us discuss briefly about each application.

 

 

Sorting using heap tree

Any kind of data can be sorted either in ascending order or in descending order using heap tree. This actually comprises of the following steps:

        Step 1:      Build a heap tree with the given set of data.

        Step 2:      (a) Delete the root node from the heap.

                           (b) Rebuild the heap after the deletion.

                           (c) Place the deleted node in the output.

        Step 3:      Continue Step 2 until the heap tree is empty.

It can be noted that, to sort the data in ascending (descending) order, we have to built max heap (min heap) in Step 1. In order to trace the method, let us assume the case of sorting the following set of data in ascending order:

                                                    33, 14, 65, 02, 76, 69, 59, 85, 47, 99, 98

As per Step 1, first we have to built the max heap tree which can be done as below:  Successively remove an element from the list and insert it into the heap starting from an empty heap. This results the heap tree as shown in Figure 8.39(a).

        We do not require another array to store the output, instead, in the same array where the heap is stored. Following Step 2, let us see, how the sorted array will grow gradually.

        99 is deleted and it is replaced by 76, the last node in the heap; the currently deleted node is placed in the position of 76, the last node. Now, the last node is 14. This is simply swapping the root node and the last node and shifting pointer to the last node just once towards the left. This is illustrated in Figure 8.39(b).

        Next, we are to rebuild the heap tree. This is as illustrated as in Figure 8.39(c)–(d).

 

 

(a)   A Heap tree from a given set of data

 

 

 

(b)   Swaping the root and the last node

 

 

(c)    Rebuild the heap tree

 

 

(d)   Continuation of step 2 after selecting 98

 

 

 

(e)   Sorted list when the heap is empty

 

Fig. 8.39(a)–(e)  Tracing of heap sort.

 

        The algorithm HeapSort is presented below to implement the sorting method.

 

Algorithm HeapSort

Input:  A set of N input data.

Output:  Sorted data in ascending order.

Data structure:  An array A where the data will be stored.

Steps:

1.

BuildMaxHeap(A)                                                         // To create the heap H from the set of data

2.

i = N

3.

While (i > 1) do                                                          // Continue until heap contains single element

4.

 

Swap (A[1], A[i])                                  // Delete the root node and replace it by the last node

5.

 

i = i – 1                                                         // Pointer to the last node shifted towards the left

6.

 

j = 1                                                                                                         // To rebuild the heap

7.

 

While (j < i) do

8.

 

 

lchild = 2 * j                                                                                                // Left child

9.

 

 

rchild = 2 * j + 1                                                                                              // Right child

10.

 

 

If (A[j] < A[lchild]) and (A[lchild] > A[rchild]) then

11.

 

 

 

Swap(A[j], A[lchild])

12.

 

 

 

j = lchild

13.

 

 

Else

14.

 

 

 

If (A[j] < A[rchild]) and (A[rchild] > A[lchild]) then

15.

 

 

 

 

Swap(A[j], A[rchild])

16.

 

 

 

 

j = rchild

17.

 

 

 

Else

18.

 

 

 

 

Break( )                                     // Rebuild is completed: break the loop

19.

 

 

 

EndIf

20.

 

 

EndIf

21.

 

EndWhile

22.

EndWhile

23.

Stop

 

 

In the above algorithm, we assume a procedure BuildMaxHeap to build a heap from a given set of data, which is nothing but the repeated insertions into the heap starting from an empty heap using the procedure MaxHeapInsert. Other procedure, like Swap(...), bears the usual meaning.

 

Priority queue implementation using heap tree

A priority queue can be implemented using circular array, linked list, etc. Another simplified implementation is possible using heap tree; the heap, however, can be represented using an array. This implementation is therefore free from the complexities of circular array and linked list but getting the advantages of simplicities of array.

        It may be recalled that, heap trees allow the duplicity of data in it. Elements associated
with their priority values are to be stored in the form of heap tree, which can be formed based on their priority values. The top priority element that has to be processed first is at the root; so it can be deleted and heap can be rebuilt to get the next element to be processed, and so on.

 

        As an illustration, consider the following processes with their priorities:

        Process               P1        P2       P3       P4       P5       P6       P7       P8        P9      P10

        Priority               5         4        3         4        5         5        3        2         1        5

 

These processes enter the system in the order as listed above at time 0, say. Assume that a process having higher priority value will be serviced first. The heap tree that can be formed considering the process’ priority values is shown in Figure 8.40(a).

        The order of servicing the processes are successive deletion of roots from the heap as illustrated in Figures 8.40(b)–(j).

 

 

        Refer to Figures 8.40(b), (c), (e), we note that out of two processes, having the same priority and in the same level, that process climbed to the root which precedes the other in their order of arrival. So, the processes are served in the sequence as

 

                                                 P1    P5    P6    P10    P2    P4    P3    P7    P8    P9

 

 

Fig. 8.40  Priority queue implementation using a heap tree.

Height Balanced Binary Tree

Let us consider the following set of data:

                                        jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec

We know that for a given set of data binary search tree is not necessarily unique. In other words, structure of a binary search tree depends on the ordering of data in the input. Hence, for the above mentioned data there are 12! binary search trees possible (each corresponding to an arrangement in the permutation of data). Only six of them are presented as shown in Figure 8.41.

        Next, let us define the average search time G, for an element in a binary search tree as

                                                                                                                                                

where

        ti = Number of comparisons for the i-th element

        n = Total number of elements in the binary search tree

        Thus with this, the average search time G for the binary search trees as mentioned in
Figure 7.52 can be calculated as:

                                                                    G(a) = 6.50    G(b) = 4.00

                                                                    G(c) = 3.50    G(d) = 3.08

                                                                    G(e) = 3.08    G(f) = 3.16

 

Figure 8.41(a)–(f)  Six examples of binary search trees from a given set of data.

        From this calculation, it is evident that, out of six varieties of representations, last three representations are efficient from the searching time point of view. The worst representation is the skewed form of the binary search tree, which needs the highest average search time.

        Now, here the question arises is that for a given set of data how a binary search tree can be constructed so that it will involve minimum average search time. The answer lies in the concept of height balanced binary search tree. A binary search tree can be made by means of calculating the balance factor of each node. We first define the term balance factor first. The balance factor of a binary tree (bf) is defined as

                            bf = Height of the left sub-tree (hL) – Height of the right sub-tree (hR)                                         

Height (h) of a binary tree as mentioned earlier is

        h = Number of nodes visited in traversing a branch which leads to a leaf node at the deepest level of the tree.

Definition.  A binary search tree is said to be height balanced binary search tree if all its nodes have a balance factor of 1, 0 or –1. That is,

                                                                                         |bf| = |hLhR| £ 1                                                                                             

for every node in the tree.

        For example, Figure 8.42 represents two binary search trees out of which one satisfies the properties of height balanced binary tree and other does not.

 

                        

 

Fig. 8.42  Two binary search trees with the balance factors of each node.

 

        It can be noted that a height balanced binary tree is always a binary search tree and a complete binary search tree is always height balanced, but the reverse may not be true.

        The basic objective of the height balanced binary search tree is to perform the searching, insertion and deletion operations efficiently. These operations may not be with the minimum time but less than that of in unbalanced binary search tree. It can be realized that the search time in case of complete binary search tree is minimum but insertion and deletion may not be with minimum time always.

        In the following discussion, we will see how an unbalanced binary search tree can be converted into a height balanced binary search tree. Suppose, initially there is a height balanced binary tree. Whenever a new node is inserted into it (or deleted from it), it may become unbalanced. We will first see the mechanism to balance an unbalance tree due to insertion of a node.

        Following steps are to be adopted.

          1.   Insert node into a binary search tree:  Insert the node into its proper position following the properties of binary search tree.

          2.   Compute the balance factors:  On the path starting from the root node to the node newly inserted, compute the balance factors of each node. It can be verified that change in balance factors will occur only in this path.

          3.   Decide the pivot node:  On the path as traced in Step 2, determine whether the absolute value of any node’s balance factor is switched from 1 to 2. If so, the tree becomes unbalanced. The node whose absolute value of balance factor is switched from 1 to 2, mark it as a special node called pivot node. (There may be more than one node whose balance factor |bf| is switched from 1 to 2 but the nearest node to newly inserted node will be the pivot node.)

          4.   Balance the unbalance tree:  It is necessary to manipulate pointers centred at the pivot node to bring the tree back into height balance. This pointer manipulation is well known as AVL rotation, mechanism of which is illustrated next in the text.

AVL rotations

In order to balance a tree, one elegant method devised in 1962 by two Russian mathematicians, G.M. Adelson-Velskii and E.M. Lendis, and the method is named as AVL rotation in their honour.

        There are four cases of rotations possible which is discussed as below:

Case 1:  Unbalance occurred due to the insertion in the left sub-tree of the left child of the pivot node.

        Figure 8.43 illustrates the rotation for Case 1. In this case, following manipulations in pointers take place:

           ·   Right sub-tree (AR) of left child (A) of pivot node (P) becomes the left sub-tree of P.

           ·   P becomes the right child of A.

           ·   Left sub-tree (AL) of A remains the same.

This case is called LEFT-TO-LEFT insertion.

 

 

Fig. 8.43  AVL rotation when unbalance occurs due to the insertion in the left sub-tree of the

                             left child of the pivot node (Left-to-left insertion).

 

Illustration.  Consider the illustration as shown in Figure 8.44. Node 2 is inserted into the initially heright balances tree (see Figure 8.44(a)). Here, 6 is the pivot node because this is the nearest node from the newly inserted node whose balance factor is switched from 1 to 2 (see Figure 8.44(b)). This insertion corresponds to the case of LEFT-TO-LEFT insertion. AVL rotation required is depicted in Figures 8.44(c)–(d). Final height balanced tree is shown as in Figure 8.44(e) with balance factor of each node.

Fig. 8.44(a)–(e)  LEFT-to-Left insertion and AVL rotation to make the tree height balanced.

 

Case 2:  Unbalance occurs due to the insertion in the right sub-tree of the right child of the pivot node. This case is the reverse and symmetric to Case 1. This rotation is illustrated in Figure 8.42. In this case, the following manipulations in pointers take place:

           ·   Left sub-tree (BL) of right child (B) of pivot node (P) becomes the right sub-tree of P.

           ·   P become the left child of B.

           ·   Right sub-tree (BR) of B remains same.

This case is known as RIGHT-TO-RIGHT insertion.

 

Fig. 8.42  AVL rotation when unbalance occurs due to insertion in the right sub-tree of the right child of the pivot node (RIGHT-TO-RIGHT insertion).

 

Illustration.  Case 2 of AVL rotation occurs due to RIGHT-TO-RIGHT insertion. A situation is illustrated in Figure 8.43. Due to the insertion of 12, the tree becomes unbalance (see Figure 8.43(a)) and the pivot node in this case is 9. The unbalance occurs due to the insertion as the right child of the right sub-tree of the pivot node. The rotation required is illustrated in Figure 8.43(b).

Fig. 8.43  RIGHT-TO-RIGHT insertion and corresponding AVL rotation.

 

Case 3:   Unbalance occurs due to the insertion in the right sub-tree of the left child of the pivot node. Figure 8.44 illustrates the rotation required for this case of insertion. This case is known as LEFT-TO-RIGHT insertion.

Case 3 involves two rotations for the manipulation in pointers:

        Rotation 1

           ·   Left sub-tree (BL) of the right child (B) of the left child of the pivot node (P) becomes the right sub-tree of the left child (A).

           ·   Left child (A) of the pivot node (P) becomes the left child of B.

        Rotation 2

           ·   Right sub-tree (BR) of the right child (B) of the left child (A) of the PIVOT node (P) becomes the left sub-tree of P.

           ·   P becomes the right child of B.

        This case is known as LEFT-TO-RIGHT insertion. If insertion occurs at BR instead of BL, it corresponds to the same as Case 3.

Fig. 8.44  AVL rotation when unbalance occurs due to the insertion in the right sub-tree of the left child of the pivot node (LEFT-TO-Right insertion).

 

Illustration.  Consider the illustration as shown in Figure 8.45 for LEFT-TO-RIGHT insertion and corresponding AVL rotation.

 

Figure 8.45(a)–(d)  LEFT-TO-RIGHT insertion and AVL rotation.

Case 4:  Unbalance occurs due to the insertion in the left sub-tree of right child of the pivot node. This case is the reverse and symmetric to Case 3. This case is known as RIGHT-TO-LEFT insertion. The rotation requires balancing the tree in this situation as illustrated in Figure 8.46.

Fig. 8.46  AVL rotation when unbalance occurs due to the insertion at the left sub-tree of right child of the pivot node (RIGHT-TO-LEFT insertion).

 

There are two rotations for the manipulations of pointers in this case, these are:

        Rotation 1

           ·   Right sub-tree (BR) of the left child (B) of the right child (A) of the pivot node (P) becomes the left sub-tree of A.

           ·   Right child (A) of the pivot node (P) becomes the right child of B.

        Rotation 2

           ·   Left sub-tree (BL) of the right child (B) of the right child (A) of the pivot rode (P) becomes the right sub-tree of P.

           ·   P becomes the left child of B.

        This case is known as RIGHT-TO-LEFT insertion. We have considered the insertion in BR; the AVL rotation is same, if the node is inserted in BL.

Illustration.  Figure 8.47 illustrates the Case 4 of AVL rotation required during RIGHT-TO-LEFT insertion.

 

 

Fig. 8.47(a)–(c)  RIGHT-TO-LEFT insertion and AVL rotation.

                                                                                                                                                                               

Height of a height balanced binary tree

We have learned how to create a height balanced binary (AVL) tree. Let us now analyze on the height of a height balanced binary tree. Let us calculate the maximum height that an AVL tree may have. To determine the maximum height that an AVL tree with n nodes have, we first calculate the minimum number of nodes that an AVL tree of height h can have.

       

It can be proved that the maximum height possible in an AVL tree with n number of nodes is given by

 

                                                                                                       

 

Weighted Binary Tree

Before discussing about weighted binary tree, let us first introduce the following terms related to the weighted binary tree:

Path length.  Path length of a node in a tree is defined as the number of edges that has to be traversed from the node to the root node.

External node.   The node with zero children is called an external node (or in other words, an external node is nothing but a leaf node).

Internal node.  A node with one or more children is called an internal node.

Refer Figure 8.48, all the internal nodes are drawn as circles and all the external nodes are drawn as squares. Thus, nodes labelled as N1, N2, ..., N9 are internal nodes and that of L1, L2, ..., L10 are external nodes. A binary tree containing such distinctive nodes as external nodes, is known as extended binary tree.

It is already discussed in Section 7.2.2 vide Lemma 8.5 that if NE and NI denote the number of external and internal nodes then NE = NI + 1. It can be verified from Figure 8.48, where NE = 10, NI = 9, and thus NE = NI + 1.

 

 

 

Fig. 8.48  An extended binary tree.

 

        Path length of a node as defined can be verified from Figure 8.48:

                                                                      Path length of N1 = 0

                                                                      Path length of N9 = 4

                                                                      Path length of L5 = 4

                                                                      Path length of L8 = 5

and so on. Now, we will define additional two terms: external path length and internal path length.

External path length.  External path length (let it be denoted as E) of a binary tree is defined as the sum of all path lengths, summed over each path from the root node of the tree to an external node. Alternatively,

                                                                              

where, Pi denotes the path length of the i-th external node.

Internal path length.  Internal path length (let it be denoted as I) of a binary tree can be defined analogously as the sum of path lengths of all internal nodes in the tree. Alternatively,

                                                                              

where, Pi denotes the path length of the i-th internal node.

        Refer Figure 8.48, whose external path length and internal path length can be computed as

                                          L1   L2    L3     L4     L5    L6   L7   L8    L9   L10

                                    E = 2 + 3 + 3 + 4 + 4 + 3 + 3 + 5 + 5 + 4 = 36

 

                                           N1  N2   N3   N4   N5   N6    N7   N8   N9

                                    I = 0 + 1 + 2 + 1 + 2 + 3 + 2 + 3 + 4 = 18

        There is a relation between the external path length (E) and internal path length (I), which is stated in Lemma 8.8:

 

Lemma 8.8

 

In a binary tree with n internal nodes, if I denotes the internal path length, then external path length, E = I + 2n.

Proof :      This lemma can be proved by the method of induction and is left as an exercise.

        The above lemma can be verified from the extended tree as presented in Figure 7.68, where n = 9, E = 36 and I = 18.

        Now, let us discuss, for a given number of internal node, say n, what is the minimum and maximum external path lengths possible. It is evident from Lemma 7.13 that binary trees with minimum I also have the minimum E.

        It can be easily realized that, a binary tree is having maximum internal path lengths when it is skewed and minimum when all the internal nodes are as close to the root as possible, that is, when it is a complete binary tree (Figure 8.49).

 

        If Imax denotes the maximum internal path length of a binary tree with n internal nodes, then

                                                                                                                           

        Similarly, if Imin denotes the minimum internal path length of a binary tree with n internal nodes, then

                                                                                                                   

where lmax is the maximum level.

 

 

Fig. 8.49  Two binary trees with minimum and maximum path lengths.

Weighted path length

So far we have discussed about the extended binary trees where the values of internal and external nodes are same and is one unit, say. But the path lengths considering different values of nodes are possible. We will restrict our discussion in case of binary trees where all internal nodes have unit values but each of the external nodes are assigned a (non-negative) number called weights. The external path length with these weights are called external weighted path length (or, simply weighted path length) and the corresponding binary tree is termed as extended weighted binary tree (or, simply weighted binary tree).

        Hence, if Wi is a weight of an external node ni and its path length is Li then, weighted path length, P is:

                                                                                                                                               (7.30)

for a binary tree with n external nodes.

        Figure 8.50 illustrates a weighted binary tree.

 

 

Fig. 8.50  A weighted binary tree with 7 external nodes.

 

        The weighted external path length (P) of the tree in Figure 8.50 is calculated as below:

                                          P = W1L1+ W2L2 + W3L3 + W4L4 + W5L5 + W6L6 + W7L7

                                       = 2 ´ 2 + 5 ´ 2 + 3 ´ 4 + 4 ´ 4 + 1 ´ 3 + 7 ´ 3 + 6 ´ 3

                                       = 4 + 10 + 12 + 16 + 3 + 21 + 18

                                       = 84

        Our next objective is to construct a weighted binary tree for a given set of weights of the external nodes, so that the weighted path length is minimum. As earlier, we have noticed that path lengths are minimum when the extended tree is a complete binary tree, it is not true always in case of weighted binary trees. For an illustration, let us consider four external nodes having weights 2, 3, 5 and 9. Three weighted binary trees are constructed as shown in Figure 8.51.

        The weighted path lengths of the trees are:   

                                                   P(T1) = 2 ´ 2 + 3 ´ 2 + 5 ´ 2 + 9 ´ 2 = 38

                                                   P(T2) = 2 ´ 1 + 9 ´ 2 + 5 ´ 3 + 3 ´ 3 = 44

                                                   P(T3) = 5 ´ 2 + 2 ´ 3 + 3 ´ 3 + 9 ´ 1 = 34

       

 

Fig. 8.51  Three weighted binary trees with 4 weights.

 

        So, here T1, which is in the form of complete binary tree is not a minimum weighted binary tree (a weighted binary tree with minimum external weighted path length may be termed as minimum weighted binary tree), where T3 is a minimum weighted binary tree.

        An elegant solution to the problem of finding a binary tree with minimum weighted external path length was given by D. Huffman. The tree that can be constructed with the Huffman’s algorithm is guaranteed to be a minimum weighted binary tree and to honour the inventor, such trees are alternatively termed as Huffman tree. Huffman’s algorithm is an iterative method, which is stated informally as below:

        Suppose, we have to construct a minimum weighted binary tree (Huffman tree) with n weights W1, W2, . . ., Wn.

         ·   Sort the weights in ascending order.

         ·       Obtain a sub-tree with two minimum weights as the weights of external nodes.

         ·   Include the weighted path length of the sub-tree so obtained into the list of weights.

         ·   Repeat the procedure until the list contains single weight.

It can be noted that, Huffman’s algorithm constructs the tree from bottom up (starting from the external nodes to the root nodes) rather than from top down.

        The algorithm is illustrated with an example as shown in the Figure 8.51. Suppose the following weights are given as input:

        External node:       A        B        C       D        E        F

        Weight:                    2        5         3        4         1         7

The final Huffman tree is shown in Figure 8.51(e). The weighted path length of the tree can easily be obtained by sum up all the values in the internal nodes. Thus, path length of the resultant tree is = 22 + 9 + 13 + 6 + 3 = 53.

 

 

Fig. 8.51 Continued.

 

Fig. 8.51  Construction of Huffman tree.

 

        There is as an alternative weighted binary tree for the same set of data which can be constructed by trial basis as shown in Figure 8.52. The weighted path length of which can be estimated as

                                                              P = 22 + 8 + 14 + 3 + 7 = 54

 

 

Fig. 8.52  A weighted binary tree but not Huffman tree.

 

        The above example is included to emphasize the fact that the Huffman tree so obtained as in Figure 8.51 is guaranteed to be a minimum weighted external binary tree. Reader can verify this by obtaining any other weighted binary tree possible.

        It can also be noted that construction of Huffman tree by Huffman’s algorithm is not unique. Figure 8.53 represents two Huffman trees obtained by following the Huffman’s algorithm for the same set of weights.

        Here, both the trees have the same weighted path length (= 53) and they are Huffman trees, that is, the external weighted binary tree with minimum external path length, although not unique.

 

 

Fig. 8.53  Two minimum weighted Huffman trees with the same set of weights.

 

                                                                                       

Implementation of Huffman tree

Now let us consider the implementation aspect of Huffman tree. We will assume the node structure as mentioned below for a node to form Huffman tree.

 

 

Here. LCHILD and RCHILD are as usual two link fields, WEIGHT is a numeric field to store the weight of a node in case of external node, and path length of sub-tree taking the current node as the root in case of internal node. DATA is to store the label for a node; for all the internal nodes, DATA component will be empty. Also, we will assume a pointer array PARRAY[. . .] to store the pointers to nodes in a Huffman tree and Sort_PARRAY to sort the PARRAY[. . .] the elements between the array locations I and N (with respect to their weights). The Algorithm BuildHuffmanTree is now given as below:

Algorithm BuildHuffmanTree

Input:  W = [W1, W2, ..., WN] is a set of weights of N external nodes.

Output:  A Huffman tree having ROOT as the pointer to its root note with the given set of data as the weights of external nodes.

Data structure:  Linked structure of Huffman tree.

 

Steps:

1.

For i = 1 to N do                                             // Read the external nodes and initialize the PARRAY

2.

 

ptr = GetNode(NODE)

3.

 

ptr→WEIGHT = W[i]                                             // To read the weight for the external node

4.

 

ptr→LCHILD = NULL                                            // Link fields of external nodes are empty

5.

 

ptr→RCHILD = NULL

6.

 

PARRAY[i] = ptr                                                                // Store the node into the PARRAY

7.

EndFor

8.

i = 1

9.

While (i < N) do

10.

 

Sort_PARRAY(i, N)                                   // Sort the weights from i to N in ascending order

11.

 

ptr = GetNode(NODE)                                                              // Get a node for internal node

12.

 

ptr→DATA = NULL

13.

 

ptr→LCHILD = PARRAY[i]                          // Include the lowest weight as the left sub-tree

14.

 

ptr→RCHILD = ptr→PARRAY[+ 1]      // Include the second lowest weight as the right sub-tree

15.

 

ptr→WEIGHT = (ptr→LCHILD) →WEIGHT + (ptr→RCHILD) →WEIGHT

                                                          // Path length of sub-tree

16.

 

PARRAYE[+ 1] = ptr                   // Store the pointer to the sub-trees into the list of weights

17.

 

i = i + 1

18.

EndWhile

19.

ROOT = PARRAY[N]                                                                              // Root of the Huffman tree

20.

Stop

 

Application of Huffman tree

Binary trees with minimum weighted external path length have applications in several areas. One very useful application is to obtain an optimal set of codes for symbols S1, S2, ..., Sn which constitute messages. Each code is a binary string (combinations of 0’s and 1’s) which will be used for transmission of messages.

        Suppose, there are n symbols to constitute a message. We are to code these symbols by means of strings of bits. One simple way to do this is to code each symbol by a r-bit string where

                                                                              2– 1 < n £ 2r                                                                             

For example, suppose a message consists of one or more of the 11 symbols, say, S1, S2,
..., Sn.

        Then the number of binary bits required to encode each symbol is 4 (since 23 < 11 £ 24) and a coding scheme is presented as in Table 8.2.

Table 8.2  Fixed Length Coding

        Symbol                         Binary code                                    Symbol                         Binary code           

           S1                                       0000                                              S7                                     0110                 

           S2                                       0001                                              S8                                     0111                 

           S3                                       0010                                              S9                                     1000                 

           S4                                       0011                                              S10                                    1001                 

           S5                                       0100                                              S11                                    1010

           S6                                       0101

        Such kind of coding is known as fixed length coding and if the message contains all the symbols with equal probabilities (that is, equal number of occurrences) then the above coding scheme is an efficient one. But in practice, we see that different symbols have different probabilities of occurrences; if we code symbols having high frequency of occurrences with fewer bits than the symbols having low frequency of occurrences, then we can save the transmission time to transmit the messages and storage space to store the messages. Thus, in contrast to the earlier scheme, that is, fixed length coding, we should prefer the variable length coding.

        A very nice approach to obtain the variable length codes for a given set of symbols is to apply Huffman coding. Huffman coding uses the Huffman tree. With this coding, the probability of occurrence of each symbols are counted first. These probability values are then assigned as weights to the symbols. Considering these symbols as external nodes, we are to construct a Huffman tree. For an illustration, Table 8.2 shows the symbols and their probabilities. To make the weights as integer values, (for our simplicity), we have multiplied the probability values by 64. The Huffman tree so obtained is shown as in Figure 8.54(a).

 

Table 8.3  Variable Length Coding

          Symbol                               Probability                                 Weight                                Code              

            S1                                           1/4                                                16                                          00            

            S2                                           3/16                                              12                                          10            

            S3                                           1/8                                                  8                                        111            

            S4                                           1/8                                                  8                                        010            

            S5                                           1/16                                                4                                      1101            

            S6                                           1/16                                                4                                    01110            

            S7                                           1/16                                                4                                    01111            

            S8                                           1/16                                                4                                      0110            

            S9                                           1/32                                                2                                    11001            

           S10                                          1/64                                                1                                 110000            

            S11                                          1/64                                               1                                    110001

 

        Huffman coding then can be obtained from the Huffman tree by tracing all the paths for each external node starting from the root node. An edge will be labelled as 0 if it is a left edge of a node and it is 1 if it is a right edge. The sequence of edges from root node to the leaf node will result a binary string, and it corresponds to the code of that symbol. Huffman coding is shown as in Figure 7.104(b) and optimal binary codes are listed in Table 8.3.

 

 

 

 

Fig. 8.54  Variable length coding using Huffman tree.

 

 

        Whenever a sender sends some message, first it encodes constituent symbols using Huffman coding. At the receiver end, the same Huffman tree is used in order to decode the receiving message.

        Huffman codes have two important properties.

        1. It is an optimal set of binary code with minimum average length. In case of the current example, average length of binary code is 4.

        2. It has the prefix property, that is, no code appears as the prefix of other code. Or, in other words, suppose, C is the set of generated binary code, and for a given y Î C, there does not exist a x Î C such that x q z = y, for some non-empty binary string z. Here q  represents the concatenation operation.

 

Advantages of Huffman coding

Potential advantages obtained from the Huffman coding are listed as below:

          1.   The Huffman coding has the minimum average length, so it involves minimum transmission time as well as less amount of storage space. For an example, suppose there is a message of 128 symbols; all the symbols are from S1, S2, ..., S11 (as listed in Tables 7.2 and 7.3). Using fixed length coding we need 128 ´ 4 = 512 bits for the massage. On the other hand, using variable length technique,

                The number of bits required

                     =

                     =

                    

                                    

                     = 388 bits

                So, approximately 60% less bits are required if variable length coding is used than the fixed length coding. This is why, some data compression technique adopts Huffman coding scheme.

It does not require any end-of-character delimiter although the binary codes are of variable length codes. This is because of its unique prefix property. In a simple linear scan of a Huffman encoded bit string, one can easily isolate the next coded data value. For example, on receiving the bit string ‘00111110 01010’ scanner easily identifies for first two bits ‘00’ as S1, next three bits ‘111’ for S3, next five bits ‘11001’ as S9 and the last three bits ‘010’ as S4. Scanning the codes for the next symbol starts from root and ends on reaching the external node.

 

Disadvantages of Huffman Coding

Of course, the main disadvantage with Huffman coding is that they are of variable length codes. Processing time is required for both encoding and decoding the message. Another difficulty that may arise with these codes is that, if a bit gets changed due to noise then it may not recognize the remaining part of the massage properly.